Reactive Policies with Planning for Action Languages

Authors

  • Zeynep Gozen Saribatur
  • Thomas Eiter
Abstract

We describe a representation in a high-level transition system for policies that express a reactive behavior for the agent. We consider a target decision component that figures out what to do next and an (online) planning capability to compute the plans needed to reach these targets. Our representation allows one to analyze the flow of executing the given reactive policy, and to determine whether it works as expected. Additionally, the flexibility of the representation opens a range of possibilities for designing behaviors.

Autonomous agents are systems that decide for themselves what to do to satisfy their design objectives. These agents have a knowledge base that describes their capabilities, represents facts about the world and helps them in reasoning about their course of actions. A reactive agent interacts with its environment. It perceives the current state of the world through sensors, consults its memory (if there is any), reasons about actions to take and executes them in the environment. A policy for these agents gives guidelines to follow during their interaction with the environment.

As autonomous systems become more common in our lives, the issue of verifying that they behave as intended becomes more important. During the operation of an agent, one would want to be sure that by following the designed policy, the agent will achieve the desired results. It would be highly costly, time-consuming and sometimes even fatal to realize at runtime that the designed policy of the agent does not provide the expected properties. For example, in search and rescue scenarios, an agent needs to find a missing person in unknown environments. A naive approach would be to directly try to find a plan that achieves the main goal of finding the person. However, this problem easily becomes troublesome, since not knowing the environment forces the planner to consider all possible cases and to find a plan that guarantees reaching the goal in all settings.
Alternatively, one can describe a reactive policy for the agent that determines its course of actions according to its current knowledge, and guides the agent in the environment towards the main goal. One such policy could be “always move to the farthest unvisited point in visible distance, until a person is found”. Following this reactive policy, the agent would traverse the environment by choosing its actions to reach the farthest possible point from the current state, and by reiterating the decision process after reaching a new state. The agent may also remember the locations it has been in and gain information (e.g. obstacle locations) through its sensors on the way. Verifying beforehand whether or not the designed policy of the agent satisfies the desired goal (e.g. can the agent always find the person?) in all possible instances of the environment is nontrivial.

* This work has been supported by Austrian Science Fund (FWF) project W1255-N23. Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
arXiv:1603.09495v1 [cs.AI] 31 Mar 2016

Action languages (Gelfond and Lifschitz 1998) provide a useful framework for defining actions and reasoning about them, by modeling dynamic systems as transition systems. Their declarative nature helps in describing the system in an understandable, concise language, and they also address the problems encountered when reasoning about actions. By design, these languages are made to be decidable, which ensures reliable descriptions of dynamic systems. As these languages are closely related to classical logic and answer set programming (ASP) (Lifschitz 2008; 1999), they can be translated into logic programs and queried for computation. The programs produced by such translations can yield sound and complete answers to such queries. There have been various works on action languages (Gelfond and Lifschitz 1998; 1993; Giunchiglia and Lifschitz 1998) and their reasoning systems (Giunchiglia et al. 2004; Gebser, Grote, and Schaub 2010), with underlying mechanisms that rely on SAT and ASP solvers.

The shortage of representations that are capable of modeling reactive policies prevents one from verifying such policies using action languages as above before putting them into use. The necessity of such a verification capability motivates us to address this issue. We thus aim for a general model that allows for verifying the reactive behavior of agents in environments of different types with respect to observability and determinism. In that model, we want to use the representation power of the transition systems described by action languages and combine components that are efficient for describing reactivity.

Towards this aim, we consider in this paper agents with a reactive behavior that decide their course of actions by determining targets to achieve during their interaction with the environment. Such agents come with an (online) planning capability that computes plans to reach the targets. This method matches the observe-think-act cycle of Kowalski and Sadri (1999), but involves a planner that considers targets. The flexibility in the two components, target development and external planning, allows for a range of possibilities for designing behaviors. For example, one can use HEX (Eiter et al. 2005) to describe a program that determines a target given the current state of an agent, and finds the respective plan and the execution schedule. ACTHEX programs (Fink et al. 2013), in particular, provide the tools to define such reactive behaviors, as they allow for iterative evaluation of the logic programs and the ability to observe the outcomes of executing the actions in the environment.

Specifically, we make the following contributions: (1) We introduce a novel framework for describing the semantics of a policy that follows a reactive behavior, by integrating components of target establishment and online planning.
The purpose of this work is not synthesis, but to lay foundations for verification of behaviors of (human-designed) reactive policies. The outsourced planning might also lend itself to modular, hierarchical planning, where macro actions (expressed as targets) are turned into a plan of micro actions. Furthermore, outsourced planning may also be exploited to abstract from correct sub-behaviors (e.g. always going to the farthest point). (2) We relate this to action languages and discuss possibilities for policy formulation. In particular, we consider the action language C (Giunchiglia and Lifschitz 1998) to illustrate an application.

The remainder of this paper is organized as follows. After some preliminaries, we present a running example and then the general framework for modeling policies with planning. After that, we consider the relation to action languages, and as a particular application we consider (a fragment of) the action language C. We briefly discuss some related work and conclude with some issues for ongoing and future work.

Preliminaries

Definition 1. A transition system T is defined as T = ⟨S, S_0, A, Φ⟩ where
  • S is the set of states.
  • S_0 ⊆ S is the set of possible initial states.
  • A is the set of possible actions.
  • Φ : S × A → 2^S is the transition function, which returns the set of possible successor states after applying a possible action in the current state.

For any states s, s′ ∈ S, we say that there is a trajectory between s and s′, denoted by s →σ s′ for some action sequence σ = a_1, ..., a_n where n ≥ 0, if there exist s_0, ..., s_n ∈ S such that s = s_0, s′ = s_n and s_{i+1} ∈ Φ(s_i, a_{i+1}) for all 0 ≤ i < n.

We will refer to this transition system as the original transition system. The constituents S and A are assumed to be finite in the rest of the paper. Note that this transition system represents fully observable settings. Large environments give rise to a large number of possible states, which makes the transition system large. In particular, if the environment is nondeterministic, the resulting transition system contains a large number of transitions between states, since one needs to consider all possible outcomes of executing an action.
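To make Definition 1 concrete, the following is a minimal Python sketch of a finite transition system with a nondeterministic transition function, together with a check for the trajectory relation s →σ s′. The class name TransitionSystem, the method names, and the three-cell corridor with a "slipping" move are all hypothetical and chosen only for illustration; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


@dataclass
class TransitionSystem:
    """Finite transition system T = <S, S_0, A, Phi> (illustrative sketch)."""
    states: FrozenSet[State]
    initial: FrozenSet[State]
    actions: FrozenSet[Action]
    # Phi as a table: (state, action) -> set of possible successor states.
    # A missing key is read as "no successor", i.e. the empty set.
    phi: Dict[Tuple[State, Action], FrozenSet[State]]

    def successors(self, s: State, a: Action) -> FrozenSet[State]:
        return self.phi.get((s, a), frozenset())

    def has_trajectory(self, s: State, sigma: Sequence[Action], s_prime: State) -> bool:
        """True iff some s_0..s_n exists with s_0 = s, s_n = s_prime and
        s_{i+1} in Phi(s_i, a_{i+1}) for every step of sigma."""
        frontier = {s}  # all states reachable by the prefix of sigma read so far
        for a in sigma:
            frontier = {t for u in frontier for t in self.successors(u, a)}
            if not frontier:
                return False
        return s_prime in frontier


# Hypothetical example: a corridor with cells 0, 1, 2.
# "right" is nondeterministic: the agent either advances or slips and stays put.
T = TransitionSystem(
    states=frozenset({0, 1, 2}),
    initial=frozenset({0}),
    actions=frozenset({"left", "right"}),
    phi={
        (0, "right"): frozenset({0, 1}),
        (1, "right"): frozenset({1, 2}),
        (1, "left"): frozenset({0}),
        (2, "left"): frozenset({1}),
    },
)

# Every possible outcome of an action is a separate transition.
num_transitions = sum(len(targets) for targets in T.phi.values())
```

In this toy instance the two nondeterministic "right" entries each contribute two transitions, which hints at how quickly the number of transitions grows once every action outcome has to be listed. Note that has_trajectory only checks that some run of σ can end in s′, as in the definition above; it does not say that every run does.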


Similar resources

Reactive Policy Checking for Action Languages

As autonomous systems become more common in our lives, the issue of verifying that they behave as intended and that their design policies are correct becomes more important. This thesis aims to build foundations for such a verification capability for policies with a reactive behavior, with a focus on combining the representation power of action languages with model checking techniques.

Minority Language Policy and Planning in the Micro Context of the City: The Case of Manchester

This paper investigates service provisions in community languages offered by Manchester City Council and agencies working alongside to find out whether there is an explicit language policy in Manchester, how such a policy is formulated, how it functions, and how it is reflected in education. Data was collected through interviews with different personnel in MCC, focus group discussions with comm...

Intersectoral Planning for Public Health: Dilemmas and Challenges

Background Intersectoral action is often presented as essential in the promotion of population health and health equity. In Norway, national public health policies are based on the Health in All Policies (HiAP) approach that promotes whole-of-government responsibility. As part of the promotion of this intersectoral responsibility, p...

Discrepancy Search with Reactive Policies for Planning

We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches could find good policies from solved trajectories of small problems and such policies have been successfully applied to larger problems of the target domains. Often, due to the inductive nature, the learned reactive policies are mostly correct but commit errors on some portion...

Object and Action Naming: A Study on Persian-Speaking Children

Objectives: Nouns and verbs are the central conceptual linguistic units of language acquisition in all human languages. While the noun-bias hypothesis claims that nouns have a privilege in children's lexical development across languages, studies on Mandarin and Korean and other languages have challenged this view. More recent cross-linguistic naming studies on children in German, Turkish,...


Publication date: 2016